March 15, 2017



Welcome (bem-vindos)

These slides: http://www.databrew.cc/cism

This course

  • 6 classes
  • 6 homework assignments
  • 1 final exam
  • 1 certificate of completion

Rules

  • Be on time (class starts at 15:00 exactly)
  • Do your homework
  • Ask for help

5 reasons to use R

Reason 1: It's free

Reason 2: It's "open source"

mxcursos.com

Reason 3: It's beautiful

https://www.r-bloggers.com/a-map-of-the-world-by-tweets/

http://www.isric.org/sites/default/files/image01.png

https://ryouready.wordpress.com/2015/04/14/beautiful-plots-while-simulating-loss-in-two-part-procrustes-problem/

http://asbcllc.com/visualizations/weather/gotham_2014/plot.svg

Reason 4: It's powerful

https://medium.com/@itamargilad/analyze-your-data-like-a-pro-with-r-e5e89a64564a#.thq1nkp65

Reason 5: It's fun

http://www.hlgjyl888.com/group/fun-pictures/

Installation and set-up

Getting familiar with RStudio

First code

Let's write some code!

2 + 2
[1] 4

x <- c(1, 2, 3, 4, 5)
x
[1] 1 2 3 4 5

barplot(x)

Packages

http://www.rgbstock.com/bigphoto/mB1JGWC/Box

A "package" is simply a collection of code written by someone else.

It's what makes R powerful, but also confusing.

Installing packages

You only have to install a package one time.

install.packages('dplyr')
install.packages('devtools')
devtools::install_github('databrew/databrew')
devtools::install_github('joebrew/cism')
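
Because a package only needs to be installed once, a script can check whether it is already present before installing it. The sketch below is one possible way to do this; install_if_missing is a hypothetical helper name chosen for illustration and is not part of any package.

```r
# Hypothetical helper: install a CRAN package only if it is not installed yet.
install_if_missing <- function(pkg) {
  # requireNamespace() returns TRUE when the package can be loaded
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(pkg)
}

# 'stats' ships with R, so nothing gets installed here
install_if_missing('stats')
```

The same call with 'dplyr' would install dplyr the first time and do nothing afterwards.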

Using packages

You have to call the library function in every new R session before you use a package.

library(databrew)
library(cism)
library(sp)

Writing library just means "I am going to use this package".
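
There are two common ways to reach a function that lives in a package: attach the package with library, or name the package explicitly with :: (as in devtools::install_github above). The sketch below uses the stats package only because it ships with every R installation; it is in fact attached by default, so the library call is shown purely for illustration.

```r
# Option 1: name the package explicitly with ::
stats::median(c(1, 5, 9))

# Option 2: attach the package once, then call its functions directly
library(stats)
median(c(1, 5, 9))
```

Both calls return the same value; the :: form is handy when you need only one function from a package.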

Since we've already written library(cism), now we can use some tools from the cism package.

A map of Mozambique

plot(moz0)

A map of Manhiça

plot(man3)

Creating objects

a <- 1
a + 3
[1] 4

Let's create an object called "ages", with everyone's age.

ages <- c(30, 26, 31, 39, 45, 27, 28, 22, 19, 30, 35)

Exploring objects

How do we view our ages object?

ages
 [1] 30 26 31 39 45 27 28 22 19 30 35

How do we view just the first element of our ages object?

ages[1]
[1] 30

How do we sort our ages object?

sorted_ages <- sort(ages)
sorted_ages
 [1] 19 22 26 27 28 30 30 31 35 39 45

How do we get the minimum, maximum, and average age?

min(ages)
[1] 19
max(ages)
[1] 45
mean(ages)
[1] 30.18182
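
Square brackets can select more than one element at a time. The following sketch reuses the ages values printed above and shows selection by position and by condition; the object name over_30 is made up for this example.

```r
ages <- c(30, 26, 31, 39, 45, 27, 28, 22, 19, 30, 35)

# How many elements are there?
length(ages)

# The first three elements (positions 1 to 3)
ages[1:3]

# Only the ages greater than 30
over_30 <- ages[ages > 30]
over_30
```

Inside the brackets, ages > 30 produces one TRUE or FALSE per element, and only the TRUE positions are kept.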

Visualizing objects

How do we visualize our ages object?

hist(ages)

Multi-dimensional objects

Previously, we looked at a one dimensional object: ages.

But most data is two dimensional: rows and columns.

This is called a data frame.
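
To make the idea of rows and columns concrete, here is a minimal sketch of a data frame built by hand. The names and ages are invented for illustration only.

```r
# Made-up data: each row is a person, each column is a variable
people <- data.frame(
  name = c('Ana', 'Carlos', 'Maria'),
  age  = c(30, 26, 31)
)

people          # print the whole data frame
nrow(people)    # number of rows
ncol(people)    # number of columns
people$age      # one column, extracted with $
```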

Let's play around with some real data.

Let's create a simple data frame.

www.databrew.cc/frangos.csv

frangos <- databrew::frangos

Frangos (chickens)

head(frangos)
# A tibble: 6 x 4
  diet  chick  days grams
  <chr> <int> <dbl> <int>
1 corn      1 0.192    42
2 corn      1 1.01     51
3 corn      1 4.52     59
4 corn      1 6.72     64
5 corn      1 8.14     76
6 corn      1 9.11     93

Let's explore.

Brackets: []

Let's filter

Let's visualize
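
The three steps above (brackets, filtering, visualizing) might look like the sketch below. It rebuilds only the six rows shown in the head(frangos) output as a plain data frame, so it runs without the databrew package; frangos_head and heavy are hypothetical names, and the cut-off of 60 grams is an arbitrary choice for the example.

```r
# The six rows printed by head(frangos), typed in by hand
frangos_head <- data.frame(
  diet  = rep('corn', 6),
  chick = rep(1L, 6),
  days  = c(0.192, 1.01, 4.52, 6.72, 8.14, 9.11),
  grams = c(42L, 51L, 59L, 64L, 76L, 93L),
  stringsAsFactors = FALSE
)

# Brackets: [rows, columns]
frangos_head[1, ]          # first row, all columns
frangos_head[, 'grams']    # all rows, one column

# Filter: keep only the rows where the weight is above 60 grams
heavy <- frangos_head[frangos_head$grams > 60, ]
heavy

# Visualize: weight against age in days
plot(frangos_head$days, frangos_head$grams,
     xlab = 'Days', ylab = 'Grams')
```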

Workflow, projects

  1. Always save your scripts.

  2. Never save your "workspace".

  3. Work in "projects"



First analysis

Getting data

We're going to use the cism package to get weather data for the FQMA weather station (Maputo).

library(cism)
??get_weather

weather <- get_weather(station = 'FQMA', 
                       start_year = 2010,
                       end_year = 2016)

Exploring data

Now that we have our weather data, we can look at it.

head(weather)

Some questions on our data

  1. How many rows are in our data?
  2. How many columns?
  3. What are the names of the columns?

# 1. How many rows are in our data?
nrow(weather)
# 2. How many columns?
ncol(weather)
# 3. What are the names of the columns?
colnames(weather)
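
The same three functions work on any data frame. As a sketch that runs without downloading anything, the example below applies them to airquality, a small weather-related dataset that ships with R; it stands in for the weather object and is not the course data.

```r
# airquality is a built-in example dataset (daily measurements in New York, 1973)
data(airquality)

nrow(airquality)       # number of rows
ncol(airquality)       # number of columns
colnames(airquality)   # names of the columns
dim(airquality)        # rows and columns in one call
```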

Questions about specific columns

  4. What is the date range?
  5. What is the maximum temperature?
  6. What is the minimum temperature?
  7. What is the average temperature?

# 4. What is the date range?
range(weather$date)
# 5. What is the maximum temperature?
max(weather$temp_max, na.rm = TRUE)
# 6. What is the minimum temperature?
min(weather$temp_min, na.rm = TRUE)
# 7. What is the average temperature?
mean(weather$temp_mean, na.rm = TRUE)
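
The na.rm = TRUE argument tells R to ignore missing values (NA) before computing the result. A minimal sketch with made-up temperatures shows the difference; the vector temps is invented for this example.

```r
# Made-up temperatures with one missing value
temps <- c(28.5, NA, 31.2, 25.0)

max(temps)                  # NA, because one value is missing
max(temps, na.rm = TRUE)    # the maximum of the remaining values
min(temps, na.rm = TRUE)    # the minimum of the remaining values
mean(temps, na.rm = TRUE)   # the average of the remaining values
```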

Visualizing our data

Which variables do we have which are numeric and continuous?

  • temp_max, temp_mean, temp_min, etc…

How can we visualize these?

  • boxplot, histogram

Boxplot

boxplot(weather$temp_mean)

Histogram

hist(weather$temp_mean)

Creating new variables

Let's create a variable called "hot".

weather$hot <- ifelse(weather$temp_max > 30, 'hot', 'not hot')

head(weather)
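
The ifelse function checks a condition for every element and returns one value where it is TRUE and another where it is FALSE. Here is a small sketch with made-up maximum temperatures, using the same 30-degree threshold as above.

```r
# Made-up maximum temperatures for four days
temp_max <- c(27.5, 31.0, 30.0, 33.4)

# One label per day: 'hot' only when the temperature is strictly above 30
hot <- ifelse(temp_max > 30, 'hot', 'not hot')
hot
```

Note that a day with exactly 30 degrees is labelled 'not hot', because the condition uses > rather than >=.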

Exploring our new variable

table(weather$hot)
hot_table <- table(weather$hot)
hot_prop_table <- prop.table(hot_table)
barplot(hot_table)
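
To see what table and prop.table return, here is a sketch on a short made-up vector of labels: table counts how often each value occurs, and prop.table converts those counts into proportions that sum to 1.

```r
# Made-up labels for five days
hot <- c('hot', 'not hot', 'hot', 'not hot', 'not hot')

# Counts per category
hot_table <- table(hot)
hot_table

# Proportions per category
hot_prop_table <- prop.table(hot_table)
hot_prop_table

# The proportions can be plotted just like the counts
barplot(hot_prop_table, ylab = 'Proportion of days')
```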

Making our plot prettier

We can add arguments one at a time: main, ylab, xlab, col, and border.

barplot(hot_table,
        main = 'Hot days in Maputo',   # title
        ylab = 'Number of days',       # y-axis label
        xlab = 'Temperature',          # x-axis label
        col = c('red', 'blue'),        # one color per bar
        border = 'darkgrey')           # color of the bar outlines

Multi-variable plots

Let's create a plot of date (x-axis) and the maximum temperature (y-axis).

plot(weather$date,
     weather$temp_max)

Let's make our plot prettier.

plot(weather$date,
     weather$temp_max,
     type = 'l',
     col = 'red',
     xlab = 'Date',
     ylab = 'Maximum temperature',
     main = 'Maximum temperature in Maputo')
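
The same kind of plot can be tried on a few made-up values. The sketch below builds one week of dates with seq and as.Date and draws the invented temperatures as a red line; none of these numbers are real weather data.

```r
# Seven consecutive days, starting on an arbitrary date
dates <- seq(as.Date('2016-01-01'), by = 'day', length.out = 7)

# Made-up maximum temperatures, one per day
temp_max <- c(31, 33, 29, 30, 34, 32, 28)

# First and last date
range(dates)

# type = 'l' draws a line instead of points
plot(dates, temp_max,
     type = 'l',
     col = 'red',
     xlab = 'Date',
     ylab = 'Maximum temperature')
```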



Review

The review repeats the steps from "First analysis": get the weather data with get_weather, explore it with head, nrow, ncol, and colnames, answer the questions about specific columns, create the "hot" variable, and build the bar plot and the date plot.



Where is Joe?

We're going to analyze where Joe is, using data from Google. The data is part of the databrew package.

# Load package
library(databrew)

# Get data
joe <- databrew::joe

Let's have a look at the structure of our data.

head(joe)
        date                time longitude  latitude velocity altitude
1 2017-03-13 2017-03-13 11:08:06  32.79699 -25.40760       NA       NA
2 2017-03-13 2017-03-13 11:06:01  32.79699 -25.40760       NA       NA
3 2017-03-13 2017-03-13 11:05:32  32.80439 -25.40608       NA       NA
4 2017-03-13 2017-03-13 11:03:03  32.80439 -25.40608       NA       NA
5 2017-03-13 2017-03-13 11:01:03  32.80545 -25.40844       NA       NA
6 2017-03-13 2017-03-13 11:00:16  32.80545 -25.40779       NA       NA
  heading accuracy
1      NA     2500
2      NA     2500
3      NA     1899
4      NA     1899
5      NA      400
6      NA      699

Let's filter our data so that it only contains observations for the period from March 7-13.

joe_filtered <- joe[joe$date >= '2017-03-07' &
                      joe$date <= '2017-03-13',]
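
The filter above combines two conditions with &, and the comma before the closing bracket means "keep all columns". The sketch below shows the same pattern on a tiny made-up data frame; visits and its values are invented, and it assumes the date column is stored as a Date, which is why the limits are written with as.Date.

```r
# Made-up data: five visits on different dates
visits <- data.frame(
  date  = as.Date(c('2017-03-05', '2017-03-07', '2017-03-10',
                    '2017-03-13', '2017-03-15')),
  place = c('A', 'B', 'C', 'D', 'E')
)

# Keep the rows from March 7 to March 13 (both included), with all columns
visits_filtered <- visits[visits$date >= as.Date('2017-03-07') &
                            visits$date <= as.Date('2017-03-13'), ]
visits_filtered
```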

Now let's use the cism package to plot Manhiça.

library(cism)
library(sp)
manhica <- man3
plot(manhica)

The databrew package has a nice function called visualize_location. Let's try it out.

?visualize_location

visualize_location(x = joe_filtered,
                   spdf = manhica)

Let's also try an interactive map.

visualize_location(x = joe_filtered,
                   use_leaflet = TRUE)